AITopics

arXiv.org Artificial Intelligence Sep-29-2025

ArabJobs: A Multinational Corpus of Arabic Job Ads

El-Haj, Mo

ArabJobs is a publicly available corpus of Arabic job advertisements collected from Egypt, Jordan, Saudi Arabia, and the United Arab Emirates. Comprising over 8,500 postings and more than 550,000 words, the dataset captures linguistic, regional, and socio-economic variation in the Arab labour market. We present analyses of gender representation and occupational structure, and highlight dialectal variation across ads, which offers opportunities for future research. We also demonstrate applications such as salary estimation and job category normalisation using large language models, alongside benchmark tasks for gender bias detection and profession classification. The findings show the utility of ArabJobs for fairness-aware Arabic NLP and labour market research. The dataset is publicly available on GitHub: https://github.com/drelhaj/ArabJobs.

large language model, machine learning, natural language, (19 more...)

2509.22589

Country: Asia > Middle East > UAE (0.67)

Genre: Research Report > New Finding (0.66)

Industry:

Marketing (0.89)
Law (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)

Neural Information Processing Systems Aug-14-2025, 08:35:49 GMT

Appendix A Double

Inspired by the recent work [Dieng et al., 2019] that elucidates how skip-connections promote higher latent information, Double-LSTM consists of two LSTM units [Hochreiter and Schmidhuber] as depicted in Figure 5. I would n't go out of my way to come here . I've been here a few times and it's always been good . I've been here a few times and it's always been good . Great place to go for a quick bite to eat .

dataset, double-lstm, information, (15 more...)

Country:

North America > United States (0.16)
Asia > Middle East > Iraq (0.05)

Industry: Leisure & Entertainment > Sports (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Chaudhuri, Ritwik, C, Rajmohan, DB, Kirushikesh, Agarwal, Arvind

Automated Question Generation on Tabular Data for Conversational Data Exploration

arXiv.org Artificial Intelligence Jul-10-2024

Exploratory data analysis (EDA) is an essential step for analyzing a dataset to derive insights. Several EDA techniques have been explored in the literature. Many of them leverage visualizations through various plots. However, these are not easy for a non-technical user to interpret, and producing appropriate visualizations is also hard when there are a large number of columns. A few other works provide a view of some interesting slices of data, but it is still difficult for the user to draw relevant insights from them. Of late, conversational data exploration has been gaining a lot of traction among non-technical users. It helps the user explore the dataset without deep technical knowledge about the data. Towards this, we propose a system that recommends interesting questions in natural language based on relevant slices of a dataset in a conversational setting. Specifically, given a dataset, we pick a select set of interesting columns and identify interesting slices of such columns and column combinations based on a few interestingness measures. We use our own fine-tuned variation of a pre-trained language model (T5) to generate natural language questions in a specific manner. We then slot-fill values in the generated questions and rank them for recommendations. We show the utility of our proposed system in a conversational setting with a collection of real datasets.
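The slot-fill-and-rank step described in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the question template, the function name `rank_slice_questions`, and the interestingness measure (distance of a slice mean from the overall mean) are all assumptions chosen for illustration, and a fixed template stands in for the fine-tuned T5 generator.

```python
from statistics import mean


def rank_slice_questions(rows, group_col, value_col, top_k=3):
    """Return (score, question) pairs for slices of group_col, best first.

    Hypothetical helper: the scoring rule and template are illustrative
    assumptions, not the measures or model used in the paper.
    """
    overall = mean(r[value_col] for r in rows)

    # Group the numeric values by each distinct value of the slicing column.
    slices = {}
    for r in rows:
        slices.setdefault(r[group_col], []).append(r[value_col])

    scored = []
    for value, vals in slices.items():
        # Assumed interestingness: how far the slice mean is from the overall mean.
        score = abs(mean(vals) - overall)
        # Slot-fill the column names and slice value into a fixed template.
        question = f"What is the average {value_col} when {group_col} is {value}?"
        scored.append((score, question))

    # Rank the candidate questions by descending score.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


rows = [
    {"dept": "sales", "salary": 50},
    {"dept": "sales", "salary": 54},
    {"dept": "research", "salary": 90},
    {"dept": "support", "salary": 46},
]
ranked = rank_slice_questions(rows, "dept", "salary")
for score, question in ranked:
    print(score, question)
```

On this toy table the "research" slice ranks first because its mean lies furthest from the overall mean; a real system would presumably combine several measures.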

dataset, operator, salary, (13 more...)

2407.12859

Country:

North America > United States > New York (0.05)
Asia > India (0.05)
North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.67)

arXiv.org Artificial Intelligence Oct-11-2023

TabLib: A Dataset of 627M Tables with Context

Eggert, Gus, Huo, Kevin, Biven, Mike, Waugh, Justin

It is well established that large, diverse datasets play a pivotal role in the performance of modern AI systems for text and image modalities. However, there are no datasets for tabular data of comparable size and diversity to those available for text and images. Thus we present "TabLib", a compilation of 627 million tables totaling 69 TiB, along with 867B tokens of context. TabLib was extracted from numerous file formats, including CSV, HTML, SQLite, PDF, Excel, and others, sourced from GitHub and Common Crawl. The size and diversity of TabLib offer considerable promise in the table modality, reminiscent of the original promise of foundational datasets for text and images, such as The Pile and LAION.

arxiv, metadata, tablib, (14 more...)

2310.07875

Country:

North America > United States > New York > New York County > New York City (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > Oregon (0.04)
(9 more...)

Genre: Research Report (0.67)

Technology:

Information Technology > Software (1.00)
Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
(5 more...)

Wang, Yanchen, Singh, Lisa

Adding guardrails to advanced chatbots

arXiv.org Artificial Intelligence Jun-12-2023

Generative AI models continue to become more powerful. The launch of ChatGPT in November 2022 has ushered in a new era of AI. ChatGPT and other similar chatbots have a range of capabilities, from answering student homework questions to creating music and art. There are already concerns that humans may be replaced by chatbots for a variety of jobs. Because of the wide spectrum of data chatbots are built on, we know that they will have human errors and human biases built into them. These biases may cause significant harm and/or inequity toward different subpopulations. To understand the strengths and weaknesses of chatbot responses, we present a position paper that explores different use cases of ChatGPT to determine the types of questions that are answered fairly and the types that still need improvement. We find that ChatGPT is a fair search engine for the tasks we tested; however, it has biases in both text generation and code generation. We find that ChatGPT is very sensitive to changes in the prompt, where small changes lead to different levels of fairness. This suggests that we need to immediately implement "corrections" or mitigation strategies in order to improve the fairness of these systems. We suggest different strategies to improve chatbots and also advocate for an impartial review panel that has access to the model parameters to measure the levels of different types of biases and then recommend safeguards that move toward responses that are less discriminatory and more accurate.

large language model, machine learning, natural language, (17 more...)

2306.075

Country:

Europe (0.14)
North America > United States > District of Columbia > Washington (0.04)

Genre:

Research Report (1.00)
Personal > Interview (1.00)

Industry:

Law (1.00)
Health & Medicine (1.00)
Banking & Finance (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.36)

#artificialintelligence Feb-18-2023, 06:50:08 GMT

How to Become a Machine Learning Engineer in 2023? [Step-by-Step]

After gaining Python and Machine Learning knowledge, it's time to practice. For that, you need to use Data Science tools like Jupyter and Anaconda. Spend a few hours playing with these tools. Understand what they're for and why you should use them.

engineer, learning engineer, machine learning engineer, (13 more...)

Genre: Instructional Material (0.48)

Industry:

Education > Educational Setting > Online (0.35)
Education > Educational Technology > Educational Software > Computer Based Training (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

#artificialintelligence Oct-3-2022, 16:18:43 GMT

Data Science Mega-Course: #Build {120-Projects In 120-Days}

According to Glassdoor, the average salary for a Data Scientist is $117,345/yr. This is above the national average of $44,564. Therefore, a Data Scientist makes 163% more than the national average salary. This makes Data Science a highly lucrative career choice. It is mainly due to the dearth of Data Scientists resulting in a huge income bubble. Since Data Science requires a person to be proficient and knowledgeable in several fields like Statistics, Mathematics, and Computer Science, the learning curve is quite steep.
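The 163% figure follows from the two salaries quoted above, as this short check shows. The helper name `percent_above` is an illustrative assumption; the salary values are taken directly from the text.

```python
def percent_above(value, baseline):
    """Percentage by which value exceeds baseline."""
    return (value - baseline) / baseline * 100


data_scientist_salary = 117_345  # Glassdoor figure quoted in the text
national_average = 44_564        # national average quoted in the text

pct = percent_above(data_scientist_salary, national_average)
print(f"{pct:.0f}% above the national average")
```

Put differently, the Data Scientist salary is roughly 2.63 times the national average, which is the same statement as "163% more".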

data science, data science mega-course, scientist, (7 more...)

Industry: Health & Medicine (0.36)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.79)
Information Technology > Artificial Intelligence > Natural Language (0.56)

#artificialintelligence Jun-29-2022, 21:07:27 GMT

Data Science & Artificial Intelligence in Demand Pakistan

Pakistan's economy is ranked 26th in the world in terms of purchasing power parity (PPP) and 40th in terms of nominal gross domestic product. Pakistan has a population of approximately 190 million people, making it the world's sixth-largest country, with a nominal GDP per capita of $1,427, ranking 133rd globally. The key sectors of the Pakistani economy are agriculture, mining, industry, automotive, construction, defence, services, and transportation. The IT/ITeS sector is one of Pakistan's fastest-growing industries, accounting for around 1% of the country's GDP ($3.5 billion USD). It has doubled in the last four years, and experts predict a further 100% increase to $7 billion in the next two to four years.

data scientist, pakistan, salary, (10 more...)

Country:

North America > United States (0.15)
Asia > Pakistan > Islamabad Capital Territory > Islamabad (0.07)
Asia > India (0.05)
Asia > Bangladesh (0.05)

Genre: Instructional Material > Course Syllabus & Notes (0.31)

Industry:

Government (0.98)
Banking & Finance > Economy (0.70)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Data Science > Data Mining (0.32)

#artificialintelligence Jun-29-2022, 13:13:22 GMT

Data Science & Artificial Intelligence Demand in Indonesia

Indonesia is a country with enormous economic potential, which has not gone unnoticed by the international community. Indonesia, the largest economy in Southeast Asia, possesses a variety of qualities that position the country well for advanced economic development. Furthermore, the central government has recently shown significant support for reducing Indonesia's historic reliance on (raw) commodity exports while increasing the importance of the manufacturing industry in the economy. Looking ahead, Indonesia's information and communication technology (ICT) sector has a promising future, since the country is only getting started with the adoption of IT solutions, leaving plenty of space for expansion. Much of Indonesia's potential has yet to be realised, making the country one of the most promising ICT markets in the coming years.

data scientist, indonesia, scientist, (9 more...)

Country:

Asia > Southeast Asia (0.25)
Asia > Indonesia > Java > Jakarta > Jakarta (0.07)

Genre: Instructional Material > Course Syllabus & Notes (0.50)

Industry:

Information Technology (0.70)
Banking & Finance > Economy (0.50)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.33)